Offload Compiler Runtime for the Intel® Xeon Phi Coprocessor
Abstract
The Intel® Xeon Phi™ coprocessor platform enables offload of computation from a host processor to a coprocessor that is a fully functional Intel® Architecture CPU. This paper presents the C/C++ and Fortran compiler offload runtime for that coprocessor. It addresses why offload to a coprocessor is useful, how offload is specified, and under what conditions offload is profitable. It also serves as a guide for potential third-party developers of offload runtimes, such as a gcc-based offload compiler, ports of existing commercial offloading compilers such as CAPS® to the Intel® Xeon Phi™ coprocessor, and third-party offload library vendors that Intel is working with, such as NAG® and MAGMA®. It describes the software architecture and design of the offload compiler runtime and enumerates the key performance features of this heterogeneous computing stack, related to initialization, data movement, and invocation. Finally, it evaluates the performance impact of those features on a set of directed micro-benchmarks and larger workloads.
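As a hedged illustration (not taken from the paper), the sketch below shows how an offload is typically specified in C with the Intel compiler's Language Extensions for Offload (LEO): a `#pragma offload` names the target, and `in`/`out` clauses describe the data movement that surrounds the invocation. The function name `vector_add_offload` and its parameters are hypothetical. Compilers without offload support ignore the unknown pragma, so the same loop then simply runs on the host.

```c
/* Hypothetical example: element-wise c[i] = a[i] + b[i] for n elements.
 * Under an offload-capable Intel compiler, the pragma asks the runtime to:
 *   in(a, b : length(n))  copy n elements of a and b to the coprocessor,
 *   then execute the loop below on the coprocessor,
 *   out(c : length(n))    copy n elements of c back to the host.
 * Compilers without offload support ignore the pragma and run the loop
 * on the host, producing the same result. */
void vector_add_offload(const float *a, const float *b, float *c, int n)
{
#pragma offload target(mic) in(a, b : length(n)) out(c : length(n))
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

Called from a host program as `vector_add_offload(a, b, c, n)`, an offload-enabled build would transfer `a` and `b`, invoke the loop on the coprocessor, and copy `c` back. Whether that pays off depends on the transfer and invocation costs relative to the computation, which is the profitability question the abstract raises.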
Related papers
Efficient Hybrid Execution of C++ Applications using Intel® Xeon Phi™ Coprocessor
The introduction of Intel® Xeon Phi™ coprocessors opened up new possibilities in the development of highly parallel applications. The familiarity and flexibility of the architecture, together with compiler support integrated into the Intel C++ Composer XE, allow developers to use familiar programming paradigms and techniques that are usually not suitable for other accelerated systems. It ...
coprocessors with a basic N-body simulation
Intel® Xeon Phi™ coprocessors are capable of delivering more performance and better energy efficiency than Intel® Xeon® processors for certain parallel applications. In this paper, we investigate the porting and optimization of a test problem for the Intel Xeon Phi coprocessor. The test problem is a basic N-body simulation, which is the foundation of a number of applications in compu...
Evaluation of Directive-based Performance Portable Programming Models
We present an extended exploration of the performance portability of directives provided by OpenMP 4 and OpenACC to program various types of node architectures with attached accelerators, both self-hosted multicore and offload multicore/GPU. Our goal is to examine how successful OpenACC and the newer offload features of OpenMP 4.5 are for moving codes between architectures, and we document how ...
Using the pyMIC Offload Module in PyFR
PyFR is an open-source high-order accurate computational fluid dynamics solver for unstructured grids. It is designed to efficiently solve the compressible Navier-Stokes equations on a range of hardware platforms, including GPUs and CPUs. In this paper we will describe how the Python Offload Infrastructure for the Intel Many Integrated Core Architecture (pyMIC) was used to enable PyFR to run wi...
Many Core Acceleration of the Boundary Element Method
The Intel Xeon Phi coprocessors provide an efficient tool for the acceleration of scientific codes. Unlike GPGPU programming, where the code has to be adapted to the hardware design of the graphics cards, Intel's MIC (Many Integrated Core) technology allows for easy porting of standard CPU code. One of the options to utilize the Xeon Phi coprocessor is to run the code on the CP...